How do users behave in online chatrooms, where they instantaneously read and write posts? We analyzed
about 2.5 million posts covering various topics in Internet relay channels, and found that user activity 
patterns follow known power-law and stretched exponential distributions, indicating that online chat 
activity is not different from other forms of communication. Analysing the emotional expressions (positive, 
negative, neutral) of users, we revealed a remarkable persistence both for individual users and channels. I.e. 
despite their anonymity, users tend to follow social norms in repeated interactions in online chats, which 
results in a specific emotional "tone" of the channels. We provide an agent-based model of emotional 
interaction, which recovers qualitatively both the activity patterns in chatrooms and the emotional 
persistence of users and channels. While our assumptions about the agents' emotional expressions are rooted in
psychology, the model allows us to test different hypotheses regarding their emotional impact in online
communication.

How do human communication patterns change on the Internet? Round the clock activities of Internet users
put us into the comfortable situation of having massive data from various sources available at a fine time 
resolution. But what to look at? Which aggregated measures are most appropriate to capture how new 
technologies affect our communicative behavior? And then, are we able to match these findings with a dynamic 
model that is able to generate insights into their origin? In this paper, we provide both: a new way of analysing data 
from online chats, and a model of interacting agents to reproduce the stylized facts of our analysis. In addition to 
the activity patterns of users, we also analyse and model their emotional expressions that trigger the interactions 
of users in online chats. Validating our agent-based model against empirical findings allows us to draw conclu- 
sions about the role of emotions in this form of communication. 

Online communication can be seen as a large-scale social experiment that constantly provides us with data 
about user activities and interactions. Consequently, time series analyses have already revealed remarkable
temporal activity patterns, e.g. in email communication. Such patterns allow conclusions about how humans organize
their time and give different priorities to their communication tasks 1,3-5,7 . One particular quantity to describe
these patterns is the distribution $P(\tau)$ of the waiting time $\tau$ that elapses before a particular user
answers e.g. an email. Different studies have confirmed the power-law nature of this distribution,
$P(\tau) \sim \tau^{-\alpha}$. Its origin was attributed either to the burstiness of events 2 or to circadian
activity patterns 3 , while a recent work shows that a combination of both effects is also a plausible scenario 4 .
However, the value of the exponent $\alpha$ is still debated. A stochastic priority queue model 6 allows one to
derive $\alpha$ by comparing two different rates, the average rate $\lambda$ of messages arriving and the average
rate $\mu$ of processing messages. If $\mu \leq \lambda$, i.e. if messages arrive faster than they can be
processed, $\alpha = 3/2$ was found, which is compatible with most empirical findings and simulation models 1-3,8 .
However, in the opposite case, $\mu > \lambda$, i.e. if messages can be processed upon arrival, $\alpha = 5/2$ was found together
with an exponential correction term. The latter regime, also denoted as the "highly attentive regime", could be 
verified empirically so far only by using data about donations 7 . So, it is an interesting question to analyze other 
forms of online communication to see whether there is evidence for the second regime. 

In this paper, we analyze data about instant online communication in different chatting communities, spe- 
cifically Internet Relay Chat (IRC) channels, where each channel covers a particular topic. Prior to the very 
common social networking sites of today, IRC channels provided a safe and independent way for users to share 
and discuss information outside traditional media. Different from other types of online communication, such as 
blogs or fora where entries are posted at a given time (decided by the writer), IRC chats are instantaneous in real 
time, i.e. users read while the post is written and can react immediately. This type of interaction requires much 
higher user activity in comparison to persistent communication e.g. in fora. Further, it is more spontaneous, often 
leading to emotionally-rich communication between involved peers.
Consequently, instant communication should require specific tools 
and models for analysis, that are capable of covering these predom- 
inant features. 

Nowadays, IRC channels are still one of the most used platforms 
for collective real-time online communication and are used for vari- 
ous purposes, e.g. organization of open-source project development, 
Internet activism, dating, etc. Our dataset (described in detail in the
data section) consists of 20 IRC channels covering topics as diverse
as music, sports, casual chats, business, politics, or computer-related
issues, which is important to ensure that there is no topical bias
involved in our analysis. For each channel, we have consecutive daily
recordings of the open discussion over a period of 42 days, which
amounts to more than 2.5 million posts in total generated by more
than 20,000 different users.

We proceed as follows: first, we look into the communication
patterns of instant online discussions, to find out about
the average response time of users and its possible dependence on the 
topics discussed. This shall allow us to identify differences between 
instantaneous chatting communities and other forms of slower, per- 
sistent communication. In a second step, we look more closely into 
the content of the discussions and how they depend on the emotions 
expressed by users. Remarkably, we find that most users are very 
persistent in expressing their positive or negative emotions - which 
is not expected given the variety of topics and the user anonymity. 
This leads us to the question in what respect online chats are different 
from offline discussions which are mostly guided by social norms. 
We argue that even in instantaneous, anonymous online chats users 
behave very much like "normal" people. Our quantitative insights 
into users' activity patterns and their emotional expressions are
eventually combined to model interacting emotional agents. We
demonstrate that the stylised facts of the emotional persistence can
be reproduced by our model by calibrating only a small set of agent
features. This success indicates that our modeling framework can be
used to test further hypotheses about emotional interaction in online
communities.

Results 

User activity patterns. An IRC channel is always active, and enables 
the real time exchange of posts among users about a specific topic. 
User interaction is instantaneous: the post written by user $u_1$ is
immediately visible to all other users logged into this channel, and
user $u_2$ may reply right away. Fig. 1 illustrates the dynamics in such a
channel. As time evolves new users may enter, others may leave or 
stay quiet until they write follow-up posts at a later time. 

To characterize these activity patterns, we analyzed the waiting-time,
or inter-activity time, distribution $P(\tau)$, where $\tau$ refers to the
time interval between two consecutive posts of the same user in the
same channel, and asked about the average response time. We find that
$\tau$ is power-law distributed, $P(\tau) \sim \tau^{-\alpha}$, with some
cut-off (Fig. 1B) and an exponent $\alpha = 1.53 \pm 0.02$. The fit is based
on the maximum likelihood approach proposed by Clauset et al. 9 , and the
power-law nature of the distribution could not be rejected (p = 0.375).
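
To make the estimation step concrete, the following Python sketch shows the standard continuous maximum likelihood estimator for the exponent of a power law above a lower cut-off. It is only an illustration under stated assumptions: the cut-off `tau_min` is taken as given (the procedure of Clauset et al. also selects it from the data, which is not reproduced here), and the function names and the synthetic sample are hypothetical, not taken from the original analysis.

```python
import numpy as np


def fit_power_law_exponent(tau, tau_min):
    """Continuous MLE of alpha for P(tau) ~ tau^(-alpha), restricted to tau >= tau_min."""
    tail = np.asarray(tau, dtype=float)
    tail = tail[tail >= tau_min]
    n = tail.size
    alpha = 1.0 + n / np.sum(np.log(tail / tau_min))
    sigma = (alpha - 1.0) / np.sqrt(n)  # asymptotic standard error of the estimate
    return alpha, sigma


def sample_power_law(alpha, tau_min, size, rng):
    """Inverse-transform sampling from a pure power law with lower cut-off tau_min."""
    u = rng.random(size)
    return tau_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


# Synthetic check (assumed values): draw waiting times with alpha = 1.53 and recover it.
rng = np.random.default_rng(0)
synthetic = sample_power_law(alpha=1.53, tau_min=1.0, size=200_000, rng=rng)
alpha_hat, sigma_hat = fit_power_law_exponent(synthetic, tau_min=1.0)
```

On real inter-activity times one would pass the measured intervals (in minutes) instead of the synthetic sample; the quality of the fit still has to be assessed separately, e.g. with a Kolmogorov-Smirnov based test.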

This finding (a) is in line with the power-law distributions already
found for diverse human activities 1,3-5,7 and (b) classifies the
communication process as belonging to the regime where posts arrive
faster than they can be processed. We note that for $\alpha < 2$, no
average response time is defined, since the mean of a pure power law
diverges (it would have been defined, however, for the
highly attentive regime). Further, we observe in the plot of Fig. 1B a
slight deviation from the power-law at a time interval of about one 
day, which shows that some users have an additional regularity in
their behavior with respect to the time of the day they enter the online
discussion. Such deviations were usually treated as power-laws with an
exponential cut-off, and can even be explained based on simple entropic
arguments 10,11 . However, because of the "bump" around the one day
time interval, our distribution also seems to provide further evidence for
the bi-modality proposed by Wu et al. 12 . We should note, however, that
the tail is better fitted by a log-normal distribution (KS = 0.136) than
by an exponential (KS = 0.190) or a Weibull (KS = 0.188) one (again
using the maximum likelihood methodology described by Clauset
et al. 9 ), as shown in Fig. 1B. Here, KS stands for the Kolmogorov-Smirnov
statistic; the smaller this number, the better the fit.

Figure 1 | Communication activity over an IRC channel. A) Schema of the evolution of a conversation in an IRC channel. At every time step, a user
enters a post expressing a positive, negative, or neutral emotion. B) Probability distribution of the user activity over all the IRC channels. The activity is
expressed as the time interval $\tau$ between two consecutive posts of the same user. Inset: Probability distribution of the user activity for individual IRC
channels. The time is measured in minutes. C) Scaled probability distribution of the time interval $\omega_{ch}$ between consecutive posts entered in all the 20 IRC
channels. The solid line represents a stretched exponential fit to the data. Inset: Probability distribution of the time interval $\omega_{ch}$ between consecutive posts
entered in all the 20 IRC channels without rescaling. The time is measured in minutes.

We now focus on an important difference between online chats 
and previously studied forms of communication, such as mail or 
email exchange, which mostly involve two participants. Due to the 
collective nature of chats, a chatroom automatically aggregates the 
posts of a much larger number of users, which allows us to study their
collective temporal behavior. If $\omega$ denotes the time interval between
two consecutive posts in the same channel independent of any user
(also denoted as inter-event time, and to be distinguished from the
inter-activity time characterizing a single user), we find that the
distribution $P(\omega)$ is still fat-tailed, but does not follow a
power-law. Interestingly, the time interval between posts significantly
depends on the topic discussed in the channel (Inset of Fig. 1C).
Some "hot" topics receive posts at a higher rate than others, which
can be traced back to the different number of users involved in
these discussions. Specifically, we find that the average inter-event
time $\langle\omega\rangle_{ch}$ depends on the number of users in the
conversation and becomes smaller for more popular channels, as one would expect.

If we rescale the channel-dependent inter-event distribution
$P_{ch}(\omega_{ch})$ using the average inter-event time
$\langle\omega\rangle_{ch}$ per channel and plot
$\langle\omega\rangle_{ch} P_{ch}(\omega_{ch})$ versus
$\omega_{ch}/\langle\omega\rangle_{ch}$, we find that all the curves collapse
into one master curve (Fig. 1C). The general scaling form that we
used is $P(\omega) = (1/\langle\omega\rangle) F(\omega/\langle\omega\rangle)$,
where $F(x)$ is independent of the average activity level of the
component, and represents a universal characteristic of the
particular system. Such scaling behavior was
reported previously in the literature describing universal patterns
in human activity 13 . We fit this master curve by a stretched
exponential 14-16

$$P(\omega) = \frac{a_\gamma}{\langle\omega\rangle} \, e^{-b_\gamma (\omega/\langle\omega\rangle)^{\gamma}} \qquad (1)$$

where the stretched exponent $\gamma$ is the only fit parameter, while the
other two factors, $a_\gamma$ and $b_\gamma$, are dependent on $\gamma$ 14 . A histogram of the $\gamma$
values across the 20 channels is shown in Supplementary Figure S2.
Using only the regression results with p < 0.001 we find that the
mean value of the stretched exponents is $\langle\gamma\rangle = 0.21 \pm 0.05$.
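
The dependence of the two prefactors on the stretched exponent can be illustrated with a short Python sketch. It assumes, as is common for scaling functions of this type, that $F(x)$ is normalized and that the rescaled variable $x = \omega/\langle\omega\rangle$ has unit mean; under these two conditions the prefactors follow in closed form from Gamma functions. The function names are hypothetical, and the sketch is not taken from the original fitting code.

```python
import math

import numpy as np


def stretched_exp_constants(gamma):
    """Prefactors (a, b) of F(x) = a * exp(-b * x**gamma).

    Assumption: F integrates to one and x has unit mean, which gives
    a = gamma * G(2/gamma) / G(1/gamma)**2 and b = (G(2/gamma) / G(1/gamma))**gamma,
    where G is the Gamma function.
    """
    g1 = math.gamma(1.0 / gamma)
    g2 = math.gamma(2.0 / gamma)
    return gamma * g2 / g1**2, (g2 / g1) ** gamma


def scaling_function(x, gamma):
    """Evaluate the one-parameter master curve F(x)."""
    a, b = stretched_exp_constants(gamma)
    return a * np.exp(-b * np.asarray(x, dtype=float) ** gamma)


def rescale_inter_event_times(omega):
    """Divide the inter-event times of one channel by their channel average."""
    omega = np.asarray(omega, dtype=float)
    return omega / omega.mean()


a_exp, b_exp = stretched_exp_constants(1.0)      # plain exponential limit
a_gauss, b_gauss = stretched_exp_constants(2.0)  # half-Gaussian limit
```

For $\gamma = 1$ the form reduces to a plain exponential with both prefactors equal to one, which is a convenient sanity check before fitting $\gamma$ to rescaled data.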

We note that stretched exponentials have been reported to 
describe the inter-event time distribution in systems as diverse as 
earthquakes 15 and stock markets 16 . These systems commonly exhibit 
long range correlations which seem to be the origin of the stretched 
exponential inter-event time distributions 14 . Long range correlations 
have also been reported in human interaction activity 5,17 , and we
tested their presence in the temporal activity over IRC communication.
As shown in the Supplementary Figure S3, we verified the
existence of long range correlations in the conversation activity.
We found that the decay of the autocorrelation function of the
inter-event time interval between consecutive posts within a channel
is described by a power-law

$$C(\Delta t) \sim (\Delta t)^{-\nu_\omega} \qquad (2)$$

with exponent $\nu_\omega \simeq 0.82$. In addition, we applied the Detrended
Fluctuation Analysis (DFA) technique 18 , described in detail in the
Methods section, and we found a Hurst exponent value $H_\omega \simeq 0.6$,
which is well in agreement with the scaling relation $\nu_\omega = 2 - 2H_\omega$.
For a more detailed discussion about scaling relations and memory
in time series please refer to ref. 19 .
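
As a rough illustration of the two quantities compared here, the following Python sketch estimates a normalized autocorrelation function of a series of inter-event times and evaluates the scaling relation between the decay exponent and the Hurst exponent. The estimator (biased, normalized by the series length) and the function names are assumptions made for this example only.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Normalized autocorrelation C(lag) for lag = 0..max_lag (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    variance = np.dot(x, x) / x.size
    return np.array(
        [np.dot(x[: x.size - lag], x[lag:]) / (x.size * variance) for lag in range(max_lag + 1)]
    )


def decay_exponent_from_hurst(hurst):
    """Scaling relation nu = 2 - 2H between correlation decay and Hurst exponent."""
    return 2.0 - 2.0 * hurst


acf = autocorrelation([1, -1, 1, -1, 1, -1, 1, -1], max_lag=2)
nu_expected = decay_exponent_from_hurst(0.6)
```

With the Hurst exponent of about 0.6 reported above, the relation gives a decay exponent of 0.8, close to the value of about 0.82 obtained from the autocorrelation function.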



In conclusion, our analysis of user activities has revealed universal
dynamics in online chatting communities which are, moreover,
similar to those of other human activities. This regards (a) the temporal
activity of individual users (characterized by a power-law distribution
with exponent 3/2) and (b) the inter-event dynamics across
different channels, if rescaled by the average inter-event time
(characterized by a stretched exponential distribution with just one fit
parameter). We will use these findings as a point of departure for a
more in-depth analysis, because the essence of online
communication in chatrooms, as compared to other human activities,
is evidently not yet covered. From the perspective of activity patterns,
there is little that is new here, which leads us to ask for other
dimensions of human communication that could reveal a difference.

Emotional expression patterns. Human communication, in 
addition to the mere transmission of information, also serves 
purposes such as the reinforcement of social bonds. This could be 
one of the reasons why human languages are found to be biased 
towards using words with positive emotional charge 20 . Humans, 
from the early stages of our lives, develop an affective 
communication system that enables us to express and regulate 
emotions 21 . But emotions are also the mediators of our consumer 
responses to advertising 22 , and many scientists acknowledge their 
importance in motivating our cognition and action 23 . However, 
despite the increasing time we spend online, the way we express
our emotions in online communities and its impact on possibly
large numbers of people are still to be explored.

Consequently, we are interested in the role of expressed emotions 
in online chatting communities. Users, by posting text in chatrooms, 
also reveal their emotions, which in turn can influence the emotional
response of other users, as illustrated in Fig. 1A. To understand
this emotional interaction, we carry out a sentiment analysis of each
post, which is described in detail in the Methods section. This automatic
classification returns the valence $v$ for each post, i.e. a discrete
value in {-1, 0, +1} that characterizes the emotional charge as either
negative, neutral, or positive.

Instead of using the real time stamp of each post as in the analysis 
of the user activity, we now use an artificial time scale in which at each 
(discrete) time step one post enters the discussion, so the number of 
time steps equals the total number of posts. We then monitor how the 
total emotion expressed in a given channel evolves over time. We use 
a moving average approach that calculates the mean emotional 
polarity over different time windows. In Fig. 2A we plot the fraction 
of neutral, negative and positive posts as a function of time, for 
different sizes of the time window. While it is obvious that the emo- 
tional content largely fluctuates when using a very small time win- 
dow, we find that for decreasing time resolution (i.e. increasing time 
window) the fractions of emotional posts settle down to an almost 
constant value around which they fluctuate. From this, we can make 
two interesting observations: (i) the emotional content in the online 
chats does not really change in the long run (one should notice that
times of the order $10^3$ are still large compared to the time window
$\Delta T = 50$ used), i.e. we observe fluctuations that depend on the time
resolution, but no "evolution" towards more positive or negative
sentiments; (ii) for the low resolution, the fraction of neutral posts
dominates the positive and negative posts at all times. In fact there is
a clear ranking where the fraction of negative posts is always the 
smallest. Both observations become even more pronounced when 
averaging over the 20 IRC channels, as Fig. 2B shows. 
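
The window-based averaging described above can be sketched in a few lines of Python. The example assumes non-overlapping windows of a fixed number of posts and valences coded as -1, 0, +1; whether the original analysis used overlapping or non-overlapping windows is not specified here, and the function name is hypothetical.

```python
import numpy as np


def windowed_emotion_fractions(valences, window):
    """Fraction of negative, neutral and positive posts in consecutive windows.

    Time is measured in posts: each window contains `window` consecutive posts.
    Posts left over at the end that do not fill a whole window are ignored.
    """
    v = np.asarray(valences)
    n_windows = v.size // window
    blocks = v[: n_windows * window].reshape(n_windows, window)
    labels = (("negative", -1), ("neutral", 0), ("positive", 1))
    return {name: (blocks == value).mean(axis=1) for name, value in labels}


fractions = windowed_emotion_fractions([1, 0, 0, -1, 0, 0, 1, 1], window=4)
```

Increasing `window` lowers the time resolution, so the three fractions fluctuate less from one window to the next.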

Our findings differ from previous observations of emotional com- 
munication in blog posts and forum comments which identified a 
clear tendency toward negative contributions over time, in particular 
for periods of intensive user activity 24,25 . Such findings suggest that an 
increased number of negative emotional posts could boost the activ- 
ity, and extend the lifetime of a forum discussion. However, blog 
communication in general evolves more slowly than e.g. online chats.
Hence, we need to better understand the role of emotions in real time
Internet communication, which obviously differs from the persistent
and delayed interaction in blogs and fora.

Figure 2 | Emotional expressions over different time scales. A) Fraction
of expressions with negative, neutral, and positive emotion values under
different time scales for one channel. B) Fraction of expressions with
negative, neutral, and positive emotion values for the 20 IRC channels.

To further approach this goal, we analyse to what extent the rather
constant fraction of emotional posts in IRC channels is due to a
persistence in the emotional expressions of users. For this, we apply
the DFA technique 18 to the time series of positive, negative and
neutral posts. Since our focus is now on the user, we reconstruct
for every user a time series that consists of all posts communicated
in any channel, where the time stamp is given by the consecutive
number at which the post enters the user's record. In order to have
reliable statistics, only those users with more than 100 posts are
considered in the further analysis (nearly 3000 users). As the
examples in the Supplementary Figure S4 show, some users are very
persistent in their (positive) emotional expressions (even though they
occasionally switch to neutral or negative posts), whereas others are
antipersistent in the sense that their expressed emotionality
rapidly changes through all three states. The persistence of these
users can be characterized by a scalar value, the Hurst exponent H
(see the Material and Methods Section for details), which is 0.5 if users
switch randomly between the emotional states, larger than 0.5 if
users are rather persistent in their emotional expressions, or smaller
than 0.5 if users have a strong tendency to switch between opposite
states, as the antipersistent time series of Fig. S4 shows.
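
To illustrate how such an exponent can be obtained from a time series, the following Python sketch implements a generic first-order DFA: the series is integrated, split into windows, linearly detrended in each window, and the exponent is read off as the slope of the fluctuation function in a log-log plot. This is a minimal sketch and may differ in details (window sizes, detrending order, fit range) from the procedure described in the Methods section; the function name and the default window sizes are assumptions.

```python
import numpy as np


def dfa_hurst(series, window_sizes=None):
    """Estimate the Hurst exponent of a series with first-order DFA."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-subtracted series
    if window_sizes is None:
        largest = max(profile.size // 4, 9)
        window_sizes = np.unique(
            np.logspace(np.log10(8), np.log10(largest), 15).astype(int)
        )
    fluctuations = []
    for n in window_sizes:
        n_windows = profile.size // n
        # One column per window, n samples per column.
        segments = profile[: n_windows * n].reshape(n_windows, n).T
        t = np.arange(n)
        slope, intercept = np.polyfit(t, segments, 1)  # linear trend per window
        residuals = segments - (np.outer(t, slope) + intercept)
        fluctuations.append(np.sqrt(np.mean(residuals**2)))
    hurst, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return hurst


rng = np.random.default_rng(1)
noise = rng.standard_normal(2**14)
h_random = dfa_hurst(noise)          # uncorrelated series, expected near 0.5
h_anti = dfa_hurst(np.diff(noise))   # differenced noise, strongly antipersistent
```

For the emotion data, `series` would be the sequence of valences (-1, 0, +1) of one user in posting order; short series give noisy estimates, which is one reason to restrict the analysis to sufficiently active users.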

If we analyse the distribution of the Hurst exponents of all users, 
shown in the histogram of Fig. 3A, we find (a) that the emotional 
expression of users is far from being random, and (b) that it is clearly 
skewed towards H > 0.5, which means that the majority of users is 
quite persistent regarding their positive, negative or neutral emo- 
tions. This persistence can be also seen as a kind of memory (or 
inertia) in changing the emotional expression, i.e. the following post 
from the same user is more likely to have the same emotional value. 

The question whether persistent users express more positive or 
negative emotions is answered in Fig. 3B, where we show a scatter 
plot of H versus the mean value of the emotions expressed by each 
user. Again, we verify that the majority of users has H > 0.5, but we 
also see that the mean value of emotions expressed by the persistent 
users is largely positive. This corresponds to the general bias towards 
positive emotional expression detected in written expression 20 . The 
lower left quadrant of the scatter plot is almost empty, which means 
that users expressing on average negative emotions tend to be per- 
sistent as well. A possible interpretation for this could be the relation 
between negative personal experiences and rumination as discussed 
in psychology 26 . Antipersistent users, on the other hand, mostly 
switch between positive and neutral emotions. 

Are the more active users also the emotionally persistent ones? In 
Supplementary Figure S6 we show a scatter plot of the Hurst exponent
as a function of the total activity of each user. Even though the
mean value of H does not show any such dependence, we observe
large heterogeneity in the values of H for users with low activity.
Furthermore, in Supplementary Figure S7 we show that the Hurst
exponent of a very active user varies only slightly if we divide their time
series into various segments and apply the DFA method to these
segments. Thus we can conclude that active users tend to be emo- 
tionally persistent and, as most persistent users express positive emo- 
tions, they tend to provide some kind of positive bias to the IRC, 
whereas users occasionally entering the chat may just try to get rid of 
some negative emotions. 

Figure 3 | Hurst exponents and emotional persistence. A) Hurst exponents (H) of the emotional expression of individual users, obtained using the DFA
method. Only users who contributed more than 100 posts were considered, and we used the exponents obtained with fitting quality $R^2 > 0.98$. B) Hurst
exponent (H) versus the mean emotion polarity expressed by individual users, again only from users who contributed more than 100 posts. C) Hurst
exponents (H) of the emotions expressed in the 20 IRC channels. The values are averages of the Hurst exponents obtained from 10 different segments of
the same channel, and the error bars show the standard deviation. The horizontal dashed line shows the expected value for random time series (H = 0.5),
and the gray squares show the value obtained from shuffling the real time series to destroy any correlations. The difference in exponents of the real and the
shuffled time series is statistically significant with p < 0.001.

This leads us to the question of how persistent the emotional bias of a
whole discussion is. While Fig. 3A has shown the persistence with
respect to the different users, Fig. 3C plots the persistence for the
different channels, which each feature a very different topic. This 
persistence holds even if we analyse only certain segments of
the channel, as shown in Supplementary Figure S8. So, we
conclude that the persistence of the discussion per se (which is
different from the persistence of the users, who can leave or enter
at arbitrary times) reflects a certain narrative memory. Precisely,
for each chat, we observe the emergence of a certain (emotional)
"tone" in the narration which can be positive, negative or neutral,
depending on the emotional expressions of the (majority of) persistent
users. If we reshuffle these time series such that the same total
number of positive, negative, and neutral posts is kept, but tem- 
poral correlations are destroyed, then the persistence is lost as well 
as Fig. 3C shows. We note that we could not find evidence of 
correlations using the autocorrelation function of the emotion 
time series, while the observed persistence in the fluctuations of 
user emotional expression, as captured by the Hurst exponent is 
very robust. This indicates that the chat community assumes an 
emotional memory locally encoded in the current messages (from 
the user perspective), while the size of the conversation is too large 
to detect it through averaging techniques. 
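
The reshuffling test mentioned above can be sketched as follows in Python: a random permutation keeps the number of positive, negative and neutral posts but destroys their temporal order. To keep the example short, persistence is measured here by a simple proxy, the fraction of consecutive posts with the same valence, which is an assumption of this sketch and not the DFA-based Hurst exponent used in the analysis; the toy series and function names are hypothetical.

```python
import numpy as np


def shuffled_surrogate(valences, rng):
    """Random permutation: same counts of -1, 0, +1, no temporal correlations."""
    return rng.permutation(np.asarray(valences))


def same_state_fraction(valences):
    """Fraction of consecutive post pairs that carry the same valence."""
    v = np.asarray(valences)
    return float(np.mean(v[1:] == v[:-1]))


rng = np.random.default_rng(2)
series = np.repeat([1, 0, -1, 1], 250)  # toy series with long runs of one valence
surrogate = shuffled_surrogate(series, rng)

persistence_real = same_state_fraction(series)
persistence_shuffled = same_state_fraction(surrogate)
```

In an actual test, the same persistence measure (e.g. the Hurst exponent) would be computed on the real series and on many such surrogates, and the two sets of values compared.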

An agent-based model for chatroom users. After identifying both 
the activity patterns, and the emotional expression patterns of users 
in online chats, we set up an agent-based model that is able to
reproduce these stylized facts. We start from a general 
framework 27 , designed to model and explain the emergence of 
collective emotions in online communities through the evolution 
of psychological variables that can be measured in experimental 
setups and psychological studies 28,29 . This framework provides a 
unified approach to create models that capture collective 
properties of different online communities, and allows us to compare
the different emotional microdynamics present in various types of 
communication. The case of IRC channel communication is of 
particular interest because of its fast and ephemeral nature. Thus, 
we have designed a model for IRC chatrooms, as shown in Fig. 4A. 
The agents in our model are characterized by two variables: their
emotionality, or valence, $v$, which is either positive or negative, and
their activity, or arousal, which is represented by the time interval $\tau$
between two posts $s$ in the chatroom. The valence of an agent $i$,
represented by the internal variable $v_i$, changes in time due to a
superposition of stochastic and deterministic influences 27,30 :

$$\dot{v}_i = -\gamma_v v_i + b \, (h_+ - h_-) \, v_i + A_v \xi_i \qquad (3)$$

The stochastic influences are modeled as a random factor $A_v \xi_i$,
normally distributed with zero mean and amplitude $A_v$, and
represent all changes of the individual emotional state apart from
chat communication. The deterministic influences are composed of
an internal decay with parameter $\gamma_v$, and an external influence of the
conversation. The change in the valence caused by the emotionality
of the field $(h_+ - h_-)$ is measured in valence change per time unit
through the parameter $b$. Previous models under the same
framework 27,31 had an additional saturation term in the equation of
the valence dynamics. This way the positive feedback between $v$ and $h$
was limited when the field was very large. But, as we show in Fig. 2,
chatrooms do not show the extreme cases of emotional polarization
observed in other communities. Thus, we simplify the dynamics of
the valence without using any saturation terms, since a large
imbalance between $h_+$ and $h_-$ is unrealistic given our analysis of
real IRC data.
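
A possible numerical treatment of the valence dynamics of Eq. 3 is sketched below in Python. The sketch assumes an Euler-Maruyama discretization with time step `dt`, in which the stochastic term is treated as white noise and scaled by the square root of `dt`; this discretization, the function name and all parameter values are assumptions for illustration and are not taken from the simulation code of the model.

```python
import numpy as np


def valence_step(v, h_plus, h_minus, gamma_v, b, A_v, dt, rng):
    """One Euler-Maruyama step of dv/dt = -gamma_v*v + b*(h_plus - h_minus)*v + A_v*xi."""
    v = np.asarray(v, dtype=float)
    drift = -gamma_v * v + b * (h_plus - h_minus) * v  # relaxation plus field influence
    noise = A_v * np.sqrt(dt) * rng.standard_normal(v.shape)  # stochastic influences
    return v + drift * dt + noise


rng = np.random.default_rng(3)
v0 = np.array([1.0, -1.0])

# Without noise and without field the valence simply relaxes towards zero.
v_relaxed = valence_step(v0, h_plus=0.0, h_minus=0.0, gamma_v=0.5, b=0.1, A_v=0.0, dt=0.1, rng=rng)

# A positive field slows the decay of positive and of negative valences alike,
# because the field term is proportional to v itself.
v_field = valence_step(v0, h_plus=2.0, h_minus=0.0, gamma_v=0.5, b=0.1, A_v=0.0, dt=0.1, rng=rng)
```

Because there is no saturation term, the valence grows without bound whenever $b (h_+ - h_-)$ exceeds $\gamma_v$ for long enough, which is why the sketch is only meaningful for moderate field imbalances.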

In general, the level of activity associated with the emotion, known 
as arousal, can be explicitly modeled by stochastic dynamics as well 31 . 
Here, the activity of an agent is estimated by the time-delay distribution
that triggers the expression of the agent, i.e. by the power-law
distribution $P(\tau) \sim \tau^{-1.53}$ shown in Fig. 1B. Assuming that an agent
becomes active and expresses its emotion at time $t$, it will become
active again after a period $\tau$. The agent then writes a post in the
online chat, the emotional content of which is determined by its
valence (see below). This information is stored in an external field
common to all agents, which is composed of two components, $h_-$
and $h_+$, for negative and positive information, and their difference
measures the emotional charge of the communication activity. Since
we are interested in emotional communication, we assume that all
neutral posts entered, or already present, in a chatroom do not
influence the emotions of the agents participating in the conversation.
Thus, the dynamics of the field is influenced only by the
number of agents expressing a particular emotion at a given time:
$N_+(t) = \sum_i (1 - \Theta(-s_i))$ and $N_-(t) = \sum_i (1 - \Theta(s_i))$, where $\Theta$ is
the Heaviside step function. Therefore, the time dynamics of the
fields can be described as:

$$\dot{h}_\pm = -\gamma_h h_\pm + c \, N_\pm(t) \qquad (4)$$

Figure 4 | Modeling schema, and simulation results. A) Schematic representation of the model: The horizontal layer represents the agent, the vertical
layer the communication in the chatroom where posts are aggregated. After a time lapse $\tau$, which follows the power-law distribution of Fig. 1B, the agent
writes a post $s$ which implicitly expresses its emotions, $v$. Posts read in the chatroom feed back on the emotional state $v$ of the agent. B) Hurst exponents for
the individual behavior of agents in isolation with $A_v \in [0.2, 0.5]$ and $\gamma_v \in [0.2, 0.5]$. Only the exponents derived with fitting quality $R^2 > 0.9$ are
considered. C) Scaled probability distribution of the time interval $\omega'$ between consecutive posts in 10 simulations of the model. Stretched exponential fit
shows similar behavior to real IRC channel data.

These two field components, $h_+$ and $h_-$, decay exponentially with a
constant factor $\gamma_h$, i.e. their importance decays very fast as they move
further down the screen (posts never disappear, but become less
influential). Each field increases by a fixed amount $c$ from every post
stored in it. The values of the valence of the agents are changed by the
field components, as described by Eq. 3. In contrast with traditional
means of communication, online social media can aggregate much
larger volumes of user-generated information. This is why $h$ is
defined without explicit bounds. Chatrooms pose a special case of
this kind of communication, as they can contain a large number of
posts but a limited number of users. Most IRC channels have technical
limitations on the number of users that can be connected at once,
which in turn is reflected in the total number of posts present in the
general discussion. In our model, $h$ might take any value, but the
empirical activity pattern combined with the fixed size of the
community dynamically constrains it to limited values.
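
The field dynamics of Eq. 4 can be illustrated with a simple explicit Euler step in Python. The sketch counts the positive and negative posts expressed in the current step (neutral posts are ignored, as assumed in the model) and then updates both field components; the Euler discretization, the function names and the parameter values are assumptions made for this example.

```python
import numpy as np


def count_expressions(s):
    """Number of positive and negative posts; neutral posts (s = 0) are not counted."""
    s = np.asarray(s)
    return int(np.sum(s == 1)), int(np.sum(s == -1))


def field_step(h_plus, h_minus, s, gamma_h, c, dt):
    """One Euler step of dh/dt = -gamma_h*h + c*N(t) for both field components."""
    n_plus, n_minus = count_expressions(s)
    h_plus_next = h_plus + dt * (-gamma_h * h_plus + c * n_plus)
    h_minus_next = h_minus + dt * (-gamma_h * h_minus + c * n_minus)
    return h_plus_next, h_minus_next


# Two positive posts, one neutral and one negative post in this time step.
h_plus_next, h_minus_next = field_step(
    h_plus=1.0, h_minus=0.0, s=[1, 1, 0, -1], gamma_h=0.5, c=0.1, dt=1.0
)
```

Since the number of simultaneously active agents is bounded, repeated application of this update keeps both components finite even though no explicit bound on $h$ is imposed.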

Whenever an agent creates a new post in an ongoing conversation,
the variable $s_i$ obtains its value in the following way:

$$s_i = \begin{cases} -1 & \text{if } v_i < V_- \\ +1 & \text{if } v_i > V_+ \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

The thresholds $V_-$ and $V_+$ represent limit values of the valence that
determine the emotional content of each post, and in general can be
asymmetric, as humans tend to have different thresholds for the
triggering of positive and negative emotional expression. Each action
contributes to the amount of information stored in the information
field of the conversation, increasing $h_-$ if $s = -1$ or $h_+$ if $s = +1$.
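
The threshold rule for the emotional content of a post translates directly into code. In the Python sketch below, the function name and the threshold values used in the example calls are hypothetical; only the rule itself follows the text.

```python
def express(valence, v_minus, v_plus):
    """Map an agent's valence to the emotional content s of its post.

    Returns -1 below the negative threshold, +1 above the positive threshold,
    and 0 (neutral) otherwise. The thresholds may be asymmetric.
    """
    if valence < v_minus:
        return -1
    if valence > v_plus:
        return 1
    return 0


# Hypothetical asymmetric thresholds: positive expression is triggered more easily.
posts = [express(v, v_minus=-0.3, v_plus=0.1) for v in (-0.5, -0.2, 0.05, 0.4)]
```

A post with value -1 or +1 then adds to the corresponding field component, whereas a neutral post leaves both components unchanged.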

We emphasize that the way we model the agent behavior is very 
much in line with psychological research, where emotional states are 
represented by valence and arousal, following the dimensional rep- 
resentation of core affect 32 . The valence, v, represents the level of 
pleasure experienced by the emotional state, while the arousal repre- 
sents the degree of activity induced by the emotional state, and 
determines the moment when posts are created. The agent's valence
continuously relaxes to a neutral state and is subject to stochastic
influences, as shown empirically in ref. 33 . The effect of chatroom
communication on an agent's emotionality is modeled as an empathy-
driven process 34 that influences the valence. In the valence dynamics 
we propose in Eq. 3, agents perceive a positive influence when their 
emotional state matches the one of the community, and a negative 
one in the opposite case. When a post is created, its emotional polar- 
ity is determined by the valence, as it was suggested by experimental 
studies on social sharing of emotions 26,35 . 

All the assumptions of our model are supported by psychological 
theories. Parameter values and dynamical equations can be tested 
against experiments in psychology, providing empirical validation 
for the emotional microdynamics 28,29 . Furthermore, our model pro- 
vides a consistent view of the emotional behavior in chatrooms lead- 
ing to testable hypotheses that can drive future psychology research. 

We performed extensive computer simulations using different 
parameter sets (see supplementary material for details). By exploring 
the parameter space, we identified which parameter sets lead to 
conversation patterns similar to those observed in the real data. We used 
one such set to simulate chats in 10 channels, and we analysed the agents' 
activity and their emotional persistence. The results are shown in 
Fig. 4B, C. Specifically, we find that (a) the distribution of Hurst 
exponents for individual agents is shifted towards positive values, 
similar to the one observed in real data, thereby reproducing the 
emotional persistence of the conversation without assuming any 
time dependence between user expressions. Further, we reproduce 
(b) the empirically observed stretched exponential distribution for 
the rescaled time delays $\omega'$ between consecutive posts, without any 
further assumptions. 

We do note, however, that the stretched exponent, $\gamma$ = 0.59 (p < 
0.001), of the simulated distribution differs from that of the real IRC 
channels, where it was $\gamma$ = 0.21, i.e. there is a faster decay in the 
simulations. This could be explained by the fact that in real chats users 
usually write after they have read the previous post, i.e. there are 
additional correlations in the times at which users enter a chat. These, 
however, are not considered in the simulations, because agents post in the 
chat at random after a given time interval $\tau$, i.e. there is no additional 
coupling in posting times. Following the same approach as for 
the real data, we calculated the Hurst exponent of the simulated inter-event 
time series of the discussions. We found a Hurst exponent of 0.75; 
however, we did not observe a power-law decay of the autocorrelation 
function (see Supplementary Figure S12). This suggests that the 
observed correlations are due to the power-law distributed inter-event 
times used as input to our model, and it is in line with the above 
discussion about the absence of coupling, which also explains the 
difference in the stretched exponents. 
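
To illustrate why a larger stretched exponent corresponds to a faster decay, the following minimal sketch compares the two exponents quoted above. It assumes the common functional form exp(-(x/x0)^gamma) with a unit scale x0; the functional form, the scale, and the function name are assumptions made for illustration and are not taken from the fits reported here.

```python
import math


def stretched_exponential(x, gamma, x0=1.0):
    """Assumed stretched exponential form exp(-(x / x0) ** gamma)."""
    return math.exp(-((x / x0) ** gamma))


# Compare the exponents reported for simulated and real channels at the
# same (arbitrary) rescaled delays; x0 = 1 is an illustrative choice.
for delay in (10.0, 100.0):
    simulated = stretched_exponential(delay, gamma=0.59)
    real = stretched_exponential(delay, gamma=0.21)
    print(f"delay={delay:g}  gamma=0.59: {simulated:.3g}  gamma=0.21: {real:.3g}")
```

Under these assumptions, for every delay larger than x0 the curve with the larger exponent lies below the other, i.e. long delays between posts are rarer in the simulated chats than in the real ones.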

Finally, we observe (c) the emotional persistence in the simulated 
conversations. The mean Hurst exponent for the 10 simulated 
channels is $H_s$ = 0.567 ± 0.007, whereas for the real IRC channels $H_r$ 
= 0.572 ± 0.021 was found. These results suggest that our agent-based 
model qualitatively reproduces the emergence of emotional 
persistence in the IRC conversation and thus, based on all findings, 
is able to capture the essence of emotional influence between users in 
chatrooms. 

Discussion 

We started with the question to what extent human communication 
patterns change on the Internet. To answer this, we used a unique 
dataset of online chatting communities with about 2.5 million posts 
on 20 different topics. Our analysis considered two different dimensions 
of the communication process: (a) activity, expressed by the 
time intervals $\tau$ at which users contribute to the communication, and 
$\omega$ at which consecutive posts appear in a chat, and (b) the emotional 
expressions of users. With respect to activity patterns we did not find 
considerable differences between online chatrooms and other previously 
studied forms of online and offline communication. 
Specifically, both the inter-activity distribution of users and the 
inter-event distribution of posts followed the known distributions. 
Thus, we may conclude that humans do not really change their 
activity patterns when they go online. Instead, these patterns seem 
to be quite robust across online and offline communication. 

The picture differs, however, when looking at the emotional 
expressions of users. While we cannot directly compare our findings 
on emotional persistence to results about offline communication, we 
find differences between online chatrooms and other forms of online 
communication, such as blogs and fora. While the latter could be heated 
up by negative emotional patterns, we observe that online chats, 
which are instantaneous in time, very much follow a balanced emotional 
pattern across all topics (shown in the emotional persistence of 
the channels), but also with respect to individual users, who are in 
their majority quite persistent in their emotional expressions (mostly 
positive ones). 

This observation is indeed surprising as online chats are mostly 
anonymous, i.e. users do not reveal their personal identity. However, 
they still seem to behave according to certain social norms, i.e. there 
is a clear tendency to express an opinion in a neutral to positive 
emotional way, avoiding direct confrontations or emotional debates. 
One of the reasons for such behavior comes from the "repeated 
interaction" underlying online chats. As the daily "bump" in the activity 
patterns also suggests, most users return to the online chats regularly, 
to meet other users they may already know. This puts a kind of 
social pressure on their behavior (even in an unconscious manner) to 
behave similarly to offline conversations. In conclusion, we find that 
the online communication patterns do not differ much from common 
offline behavior if a repeated interaction can be assumed. 

Finally, we argue that the emotional persistence found is 
indeed related to the nature of human conversations. After all, the 
correlations shown in the emotional expressions of different users 
indicate that there is some form of emotional sharing between participants. 
This suggests the presence of social bonds among users in the 
chatroom 26 and confirms similarities between online and offline 
communication. 

The fact that we could reveal patterns of emotional persistence 
both in users and in topics discussed, does not mean that we also 
understand their origin. One important step towards this "micro- 
scopic" understanding is provided by our agent-based model of 
emotional interactions in chatrooms. By using assumptions about 
the agent's behavior which are rooted in research in psychology, we 
are able to reproduce the stylized facts of the chatroom conversation, 
both for the activity in channels and for the emotional persistence. 
Specifically, our model allows us to test hypotheses about the emotional 
interaction of agents against their outcome on the systemic 
level, i.e. for the chatroom simulation. This helps to reveal what kind 
of rules underlie the online behavior of users, rules which are hard to 
access otherwise. 

Methods 

Data collection and classification. The data used in this article is based on a large set 
of public channels from EFNET Internet Relay Chats (http://www.efnet.org), to 
which any user can connect and participate in the conversation. Based on the 
assessment of the initially downloaded set of recordings, 20 IRC channels were 
selected aiming to provide a large number of consecutive daily logs with transcripts of 
vivid discussions between the channel participants, measured in number of posts. The 
finally used data set contained consecutive recordings for 42 days spanning the period 
from 04-04-2006 to 15-05-2006. 

The general topics of discussions from the selected channels include: music, sports, 
casual chats, business, politics and topics related to computers, operating systems or 
specific computer programs. The IRC data set contains 2,688,760 posts. The total 
number of participants in all these channels is 25,166. However, because some people 
participate in more than one channel, the total number of unique participants is 
20,441. On average, the data set provides 3055 posts per day. In the recorded period 15 
users created more than 10000 posts. The distribution of the user participation, i.e. the 
number of posts entered by every user, is shown in Supplementary Figure S1. The 
mean of the distribution is 97 posts per user, and as we can see from Fig. S1, it is 
skewed, with most of the users contributing only a small number of posts. 

The acquired data was anonymized by replacing real user IDs with random number 
references. The text of each post was cleaned by spam detection and substitution of 
URL links, to prevent them from influencing the emotion classification. The emotional 
content was extracted by using the SentiStrength classifier 36 , which provides two 
scores for positive and negative content. Each score ranges from 1 to 5, and changes 
with the appearance of emotion-bearing terms from a lexicon of affective word usage, 
specifically designed for this purpose. Each word of the lexicon has a value on the scale 
of -5 to 5 which determines the strength of the emotion attached to it. The classifier 
takes into account syntactic rules like negation, amplification and reduction, and 
detects repetition of letters and exclamation signs as amplifiers. When one of these 
patterns is detected, SentiStrength applies transformation rules to the contribution of 
the involved terms to the sentence scores. It has been designed to analyze online data, 
and considers Internet language by detecting emoticons and correcting spelling 
mistakes. 

The perception of emotional expression varies largely across humans, and traditional 
accuracy metrics are not useful when an objective reference is lacking. 
Human ratings of emotional texts have a certain degree of disagreement that needs to 
be considered by sentiment analysis in order to have a valid quantification of emotions. 
SentiStrength scores are consistent with the level of disagreement between 
humans about how they perceive written emotional expressions 37 . This classifier 
combines an emotion quantization of proven validity with a high accuracy, and is 
considered the state of the art in sentiment detection 38 . Due to the short length of the 
posts in chatrooms, we calculate a polarity measure by comparing the two different 
scores of SentiStrength. The sign of the difference of the positive and negative scores 
provides an approximation to detect positive, negative and neutral posts. The accu- 
racy of this polarity metric was tested against texts tagged by humans and messages 
including emoticons from MySpace 39 and Twitter 40 , which are of a similar length to 
the ones in our chatroom data. The data are freely available for research purposes, and 
are provided as Supplementary Material. Detailed information about their structure is 
provided in the "Data section" of the Supplementary Information text. 
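
The polarity measure described above can be sketched as follows. The sketch assumes that both SentiStrength scores are available as magnitudes between 1 and 5, as stated in the text; the function name `polarity` and the input validation are illustrative additions and do not describe the SentiStrength interface itself.

```python
def polarity(positive, negative):
    """Polarity of a post: the sign of (positive score - negative score).

    Both arguments are assumed to be score magnitudes in the range 1..5.
    Returns +1 for a positive post, -1 for a negative post and 0 for a
    neutral post (equal scores).
    """
    if not (1 <= positive <= 5 and 1 <= negative <= 5):
        raise ValueError("scores are expected in the range 1..5")
    difference = positive - negative
    return (difference > 0) - (difference < 0)


# Three hypothetical score pairs (positive, negative).
examples = [(3, 1), (1, 4), (2, 2)]
labels = [polarity(p, n) for p, n in examples]
```

A post whose two scores are equal is treated as neutral, which covers both posts without any emotion-bearing terms and posts in which positive and negative content balance out.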

Detrended Fluctuation Analysis. The method of Detrended Fluctuation Analysis 
(DFA) 18 is a useful tool for revealing long-term memory and correlations in time 
series 5,15,16 . The method maps the system onto a one-dimensional random walk, and 
enables us to compare the properties of the real time series with those of the time series 
produced by the random case. 

The DFA analysis of a time series x(t) with length T, which can be divided into N 
segments, is performed as follows: First we integrate the time series, by calculating the 
profile $Y(t) = \sum_{t'=1}^{t} [x(t') - \langle x(t) \rangle]$. Next, we divide the integrated time series into N 
boxes of equal length $\Delta t$. Each box has a local trend, which in a first level approximation 
can be fitted by a linear function using least squares. We denote with $y_{\Delta t}(t)$ the 
y coordinate of the straight line segments that represent the local trend in each box, 
and we subtract this local trend from the integrated time series Y(t). Next we use the 
function 

$$ F(\Delta t) = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left[ Y(t) - y_{\Delta t}(t) \right]^2 } \qquad (6) $$ 

to calculate the root-mean-square fluctuation of the integrated and detrended time 
series, and we characterize the relationship between the average fluctuation $F(\Delta t)$ and 
the box size $\Delta t$. 

Typically, $F(\Delta t)$ will increase with box size as $F(\Delta t) \sim (\Delta t)^H$, which indicates the 
presence of power-law (fractal) scaling. Therefore, the fluctuations can be characterized 
by the scaling exponent H alone, which is analogous to the Hurst exponent 41 , and 
is calculated from the slope of the line relating $\log F(\Delta t)$ to $\log \Delta t$. If only short-range 
correlations (or no correlations) exist in the time series, then it has the statistical 
properties of a random walk, and therefore $F(\Delta t) \sim (\Delta t)^{1/2}$. However, in the presence of 
long-range power-law correlations (i.e. no characteristic length scale) $H \neq 1/2$. A 
value H < 1/2 signals the presence of long range anti-correlations, while a value H > 
1/2 signals the presence of long range correlations (persistence). 
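
The procedure described in this section can be sketched in Python with NumPy as follows. This is a minimal illustration with linear detrending, not the code used for the analysis in this paper: the function names, the box sizes and the random seed are arbitrary choices, and points that do not fill a complete box are simply discarded.

```python
import numpy as np


def dfa_fluctuation(x, box):
    """Root-mean-square fluctuation F for one box size (linear detrending)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())      # integrated series Y(t)
    n_boxes = len(profile) // box          # incomplete trailing box is dropped
    t = np.arange(box)
    squared_sum = 0.0
    for k in range(n_boxes):
        segment = profile[k * box:(k + 1) * box]
        slope, intercept = np.polyfit(t, segment, 1)   # local linear trend
        squared_sum += np.sum((segment - (slope * t + intercept)) ** 2)
    return np.sqrt(squared_sum / (n_boxes * box))


def hurst_exponent(x, boxes):
    """Slope of log F versus log box size, i.e. the scaling exponent H."""
    fluctuations = [dfa_fluctuation(x, b) for b in boxes]
    slope, _ = np.polyfit(np.log(boxes), np.log(fluctuations), 1)
    return slope


# Uncorrelated noise should give an exponent close to 1/2.
rng = np.random.default_rng(0)
noise = rng.standard_normal(10000)
h_noise = hurst_exponent(noise, [16, 32, 64, 128, 256])
```

For an uncorrelated series such as the one generated here, the estimated exponent is expected to lie near 1/2, while a persistent series would yield a value above 1/2. In practice, the range of box sizes should be chosen such that each size still yields enough boxes for a stable average.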